ABSTRACT

This paper introduces the idea of enhancing the audio presentation of 
a multimedia museum guide by using the PDA screen to travel throughout a fresco and 
identify the various details in it. During the presentation, a sequence of pictures is 
synchronized with the audio commentary, and the transitions among the pictures are 
planned according to cinematic techniques. The theoretical background is presented, 
discussing the language of cinematography and the Rhetorical Structure Theory to 
analyze dependency relationships inside a text. In building the video clips, a set of 
strategies similar to those used in documentaries was employed. Two broad classes of 
strategies were identified. The first class encompasses constraints imposed by the 
grammar of cinematography, while the second deals with conventions normally used in 
guiding camera movements in the production of documentaries. The results of a 
preliminary evaluation are also presented and discussed. Includes five figures. 
(Contains 11 references.) (Author) 
Abstract 

In this paper we introduce the idea of enhancing the audio presentation of a 
multimedia museum guide by using the PDA screen to travel throughout a 
fresco and identify the various details in it. During the presentation, a sequence 
of pictures is synchronized with the audio commentary, and the transitions 
among the pictures are planned according to cinematic techniques. 



The theoretical background is presented, discussing the language of 
cinematography and the Rhetorical Structure Theory to analyze dependency 
relationships inside a text. In building the video clips, a set of strategies similar 
to those used in documentaries was employed. Two broad classes of 
strategies have been identified. The first class encompasses constraints 
imposed by the grammar of cinematography, while the second deals with 
conventions normally used in guiding camera movements in the production of 
documentaries. 

The results of a preliminary evaluation are also presented and discussed. 
1. Introduction 

Many research projects are exploring the new possibilities offered by Personal Digital 
Assistants (PDAs) in a museum setting (for example, see Grinter et al., 2002; 
Cheverst, 2000; and Not et al., 1998). Most of these multimedia guides use static 
images, while others employ pre-recorded short video clips about museum exhibits. 
In previous work (Not et al., 1998, 2000), we explored different techniques to 
automatically build multimedia, location-aware presentations in a museum setting. 
The advent of more powerful devices has allowed researchers to experiment with 
new forms of multimedia, in particular time-based media such as animations. 
In this paper we introduce the idea of enhancing the audio presentation (dynamically 
assembled pre-recorded or synthesized speech) of a complex fresco by using the 
PDA screen to travel throughout the fresco itself and identify details. At presentation 
time, a sequence of pictures is synchronized with the audio commentary, and the 
transitions among them are planned according to cinematic techniques. Our 
hypothesis is that the use of this type of animation to present the description of a 
painting allows the visitor to better identify the details introduced by the audio 
counterpart of the presentation. In this manner, both the efficiency and the 
satisfaction dimensions of the system usability are increased (Nielsen, 1994) while 
also providing an enhanced learning experience for the visitor. 



The language of cinematography (Metz, 1974), including shot segmentation, camera 
movements and transition effects, is employed in order to plan the animation and to 
synchronize the visual and the verbal parts of the presentation. In building the 
animations, a set of strategies similar to those used in documentaries was thus 
employed. Two broad classes of strategies have been identified. The first class 
encompasses constraints imposed by the grammar of cinematography, while the 
second deals with conventions normally used in guiding camera movements in the 
production of documentaries. For instance, a strategy in the first class would 
discourage a zoom-in immediately followed by a zoom-out, while a different strategy 
in the second class would recommend the use of sequential scene cuts, rather than a 
fade-out effect, to visually enumerate different characters in a scene. It is worth noting 
that in the latter strategy it is often necessary to make reference to the discourse 
structure of the audio part of the presentation, such as enumeration of properties, 
background knowledge, and elaboration of related information. In order to formally 
use discourse structure, we employ the Rhetorical Structure Theory (Mann and 
Thompson, 1987). 

At present, we have completed a first prototype of a multimedia guide that employs 
cinematic techniques in presenting information for a fresco at Torre Aquila in Trento, 
Italy. A Web-based simulation of the multimedia guide can be seen at 
http://peach.itc.it/preview.html. 

The next section briefly discusses the issues in designing a multimedia museum 
guide. Section 3 introduces the features of the Torre Aquila prototype. Sections 4 and 
5 present the theoretical background, discussing respectively the relevant concepts 
for the language of cinematography and the Rhetorical Structure Theory to analyze 
dependency relationships inside a text. Section 6 illustrates the strategies used in our 
multimedia guide to produce effective and pleasant video clips starting from audio 
commentaries. Finally, in section 7, the results of a preliminary evaluation are 
presented and discussed. 

2. The Museum as a Smart Environment 

Museums and cultural heritage institutions recreate an environment conducive to 
exploring not only the exhibited objects and works of art, but also new ideas and 
experiences. Visitors are free to move around and learn concepts, inquire and even 
apply what is learned to their own worldview. A museum visit is thus a personal 
experience encompassing both cognitive aspects, such as the elaboration of 
background and new knowledge, and emotional aspects that may include the 
satisfaction of interests or the fascination with the exhibit itself. Despite the inherently 
stimulating environment created by cultural heritage institutions, on their own they 
often fall short of successfully supporting conceptual learning, inquiry-skill-building, 
analytic experiences or follow-up activities at home or at school (Semper and 
Spasojevic, 2002). 

The optimal multimedia tourist guide should support strong personalization of all the 
information provided in a museum in an effort to ensure that each visitor be allowed 
to accommodate and interpret the visit according to his own pace and interests. 
Simultaneously, a museum guide should also provide the appropriate amount of 
impetus to foster learning and self-development so as to create a richer and more 
meaningful experience. 

In order to achieve the above goals it is necessary that the information be presented 
in a manner that is appropriate to the physical location of the visitor as well as to the 
location of the works of art within the environment. Smoothly connecting the 
information found in an exhibit and presenting it to the visitor in a flexible yet coherent 
manner with respect to his physical location can maximize the overall experience and 
absorption of new information for the viewer (Stock and Zancanaro, 2002). In other 
words, if the information is provided in a manner that flows and relates pieces to each 
other, this process in and of itself can aid in stimulating the visitor’s interest and, 
hence, desire to inquire, analyze and learn. This idea relates to the concept of 
situation-aware content, where information is most effective if presented in a cohesive 
way, building on previously delivered information. This may be accomplished by using 
comparisons and references to space and time, which in turn may aid the visitor in 
becoming oriented within the museum as well as across the various works of art. 

The ideal audio guide should not only guess what the visitors are interested in, but 
also take into consideration what they have to learn: orienting visitors, providing 
opportunities for reflection and allowing them to explore related ideas, thereby greatly 
enhancing the visit’s educational value. In essence, the guide should stimulate new 
interests and suggest new paths for exploring the museum. A system that supports 
visitors in their visit should take into account their agenda, expectations and interests 
as well as the peculiarities of a cultural experience in a physical environment. 

It is essential to also consider the importance of creating an overall experience that 
truly addresses the needs of a person visiting a museum. This requires not only 
providing the visitor with a vast amount of information, even if wonderfully presented, 
but also allowing the person to spend a pleasurable and entertaining time at the 
exhibit. The concept of the immersive environment addresses the importance of 
creating a technology that supports rather than overwhelms the real experience of 
visiting a museum. A museum guide of this nature must be able to create a balance 
in terms of attention required from the visitor, also allowing time to be spent enjoying 
the "romance" of the cultural heritage institution and the works found therein. 

These and other challenges come into play when designing a system for the 
entertainment and edutainment of museum visitors. Creating an electronic tourist 
guide that transforms the user experience from one of simple consultation (commonly 
achieved with audio guides, multimedia kiosks, CDROMs or even books) to an 
immersion into a rich information environment indeed requires a careful examination 
of all the abovementioned factors, while also considering input from visitors 
themselves. Difficulties arise when observing that such systems are not intended to 
help users perform specific work-related tasks, and most of the time they cannot be 
brought back to clearly stated user requirements. Keeping in mind that the ultimate 
goal of a guide is to engage the user and to stimulate learning, it becomes clear that 
the nature of this kind of system imposes a balance between the designer's vision 
and user needs. 

Using animations or video clips enhances the richness of the interaction, though these 
may also distract the user by calling attention to the device rather than to the exhibit 
itself. Our hypothesis is that, on the contrary, a carefully planned video clip describing 
the exhibit will actually help the visitor quickly localize the details of the painting as 
well as aid the flow of the presentation by illustrating the relationship between new 
and already presented information. 

3. The Prototype at Torre Aquila 

We have applied the idea of using cinematic techniques for presenting details of 
artworks in a prototype of a multimedia guide for Torre Aquila, a tower at the 
Buonconsiglio Castle in Trento, where a fresco called "The Cycle of the Months", a 
masterpiece of the gothic period, is found. This fresco, painted in the fifteenth 
century, illustrates the activities of aristocrats and peasants throughout the year. The 
fresco is composed of eleven panels, each one representing one month (the month of 
March was destroyed over time) and occupies the four walls of the tower (see figure 
1). 
Figure 1: Torre Aquila and the grid of infrared 



Our multimedia guide, implemented with Macromedia Flash on a PDA, detects the 
position of the visitor by means of infrared emitters placed in front of each panel. 
Interaction with the system is both proposed by the system itself and accepted by the 
user, thus sharing the responsibility of information access. When the system detects 
that the visitor is in front of one of the four walls, a picture of that wall is displayed on 
the PDA and, after a few seconds, if the user has not changed position, that panel is 
highlighted (see figure 2). At this point, the visitor can click on the panel and receive a 
multimedia presentation of the panel chosen. 
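
The localization behaviour described above can be sketched as a small state tracker. This is only an illustration and not the Flash implementation used in the prototype: the class name `Localizer`, the panel identifiers and the three-second dwell time are assumptions (the text says only "a few seconds").

```python
class Localizer:
    """Tracks which infrared emitter is in view and for how long (hypothetical sketch)."""

    def __init__(self, dwell_seconds=3.0):
        # Assumed value: the paper says only "a few seconds".
        self.dwell_seconds = dwell_seconds
        self.panel = None   # panel whose emitter was last detected
        self.since = None   # time at which that panel was first detected

    def on_signal(self, panel, now):
        """Record an emitter detection; restart the timer if the visitor moved."""
        if panel != self.panel:
            self.panel = panel
            self.since = now

    def state(self, now):
        """Return what the PDA should show at time `now` (in seconds)."""
        if self.panel is None:
            return ("idle", None)
        if now - self.since >= self.dwell_seconds:
            return ("panel_highlighted", self.panel)
        return ("wall_shown", self.panel)
```

Repeated detections of the same emitter do not restart the timer, so the panel is highlighted only once the visitor has stayed in place for the whole dwell time.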




Figure 2: Snapshots of the multimedia guide localizing the user. 

The multimedia presentation is composed of an audio commentary accompanied by a 
sequence of images that appear on the PDA display and help the visitor quickly 
identify the fresco's details mentioned in the commentary. For instance, when a 
specific detail of the panel is explained by the audio, the PDA may display or highlight 
that detail, thus quickly calling the attention of the user to the area in question. 

During the presentation, the PDA displays a VCR-style control panel and a slide bar 
to signal the length of the video clip and its actual position (see figure 3). At any given 
moment, the user is free to pause, fast forward, rewind and even stop the 
presentation by tapping on the appropriate control panel button. In this manner, the 
visitor is able to control the speed as well as the information itself, while also revisiting 
sections found most interesting. 

Figure 3: Snapshot of the multimedia guide playing a video clip. 

4. The Language of Cinematography 

According to Metz (1974), cinematic representation is not like a human language that 
is defined by a set of grammatical rules; it is nevertheless guided by a set of generally 
accepted conventions. These guidelines may be used for developing multimedia 
presentations that can be best perceived by the viewer. In the following, we briefly 
summarize the basic terminology of cinematography. In section 6 we will discuss how 
these conventions can be expressed both in terms of constraints on camera 
movements and in terms of strategies related to the discourse structure of the 
associated audio commentary. 

4.1 Shot and camera movements 

The shot is the basic unit of a video sequence. In the field of cinematography a shot 
is defined as a continuous view from a single camera without interruption. Since we 
only deal with still images, we define a shot as a sequence of camera movements 
applied to the same image. 
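
A shot in this sense can be modelled as one still image plus a timed sequence of viewports over it. The sketch below is an assumption made for illustration, not the prototype's data model: the names `Viewport`, `Shot` and `viewport_at` are hypothetical, and linear interpolation between keyframes is a simplification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Viewport:
    """Rectangle of the still image visible on screen, in image pixels."""
    x: float
    y: float
    width: float
    height: float


def interpolate(start: Viewport, end: Viewport, t: float) -> Viewport:
    """Viewport at fraction t (clamped to 0..1) of a movement from start to end."""
    t = max(0.0, min(1.0, t))

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * t

    return Viewport(lerp(start.x, end.x), lerp(start.y, end.y),
                    lerp(start.width, end.width), lerp(start.height, end.height))


@dataclass
class Shot:
    """A sequence of camera movements applied to the same image."""
    image: str
    # (time in seconds, viewport) pairs with strictly increasing times
    keyframes: List[Tuple[float, Viewport]]

    def viewport_at(self, time: float) -> Viewport:
        frames = self.keyframes
        if time <= frames[0][0]:
            return frames[0][1]
        for (t0, v0), (t1, v1) in zip(frames, frames[1:]):
            if t0 <= time <= t1:
                return interpolate(v0, v1, (time - t0) / (t1 - t0))
        return frames[-1][1]
```

With this representation a horizontal or vertical movement changes only `x` or `y`, while a zoom changes `width` and `height`.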

The basic camera movements are pan (from "panorama"), a rotation of the camera 
along the x-axis; tilt, a rotation along the y-axis; and dolly, a rotation along the z-axis. 

4.2 Transition effects 

Transitions among shots are considered the punctuation symbols of cinematography; 
they affect the rhythm of the discourse and the message conveyed by the video. 
The main transitions used are cut, fade, and cross fade. A cut occurs when the last 
frame of a shot is immediately replaced by the first frame of the following shot. A fade 
occurs when one shot gradually replaces another one, either by disappearing (fade 
out) or by being replaced by the new shot (fade in). A particular case of a fade 
happens when instead of two shots, there is one shot and a black screen that can be, 
again, faded in or faded out. Finally, a cross fade (also called dissolve) occurs when 
two shots are gradually superimposed during the moment when one is faded out 
while the other is faded in. 
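
The cross fade defined above amounts to a weighted blend of the outgoing and incoming frames. The following minimal sketch assumes frames are flat lists of pixel intensities of equal length; the function name `cross_fade` and this frame representation are illustrative assumptions, not part of the prototype.

```python
def cross_fade(frame_out, frame_in, progress):
    """Blend two frames: progress 0.0 shows frame_out, 1.0 shows frame_in."""
    progress = max(0.0, min(1.0, progress))
    return [round((1.0 - progress) * a + progress * b)
            for a, b in zip(frame_out, frame_in)]


def fade_to_black(frame, progress):
    """Special case of a fade: the second 'shot' is a black screen."""
    return cross_fade(frame, [0] * len(frame), progress)
```

A cut corresponds to jumping from `progress` 0 to 1 with no intermediate frames.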

5. Rhetorical Structure Theory 

Rhetorical Structure Theory (Mann and Thompson, 1987) analyses discourse 
structure in terms of dependency trees, with each node of the tree being a segment of 
text. Each branch of the tree represents the relationship between two nodes, where 
one node is called the nucleus and the other is called the satellite. The information in 
the satellite relates to that found in the nucleus in that it expresses an idea related to 
what was said in the nucleus. This rhetorical relation specifies the coherence relation 
that exists between the two portions of text contained in the nodes. For example, a 
Cause rhetorical relation holds when the satellite describes the event that caused 
what is contained in the nucleus. Figure 4 shows an example of a rhetorical tree. 

Here the second paragraph provides background information with respect to the 
content expressed in the first paragraph. This additional information acts as a sort of 
reinforcement for what was previously said in the first paragraph and consequently 
facilitates the absorption of information. In the original formulation by Mann and 
Thompson, the theory posited twenty different rhetorical relations between a satellite 
and a nucleus, while other scholars have since added to this theory. 



[Figure 4 content, reconstructed from the diagram. 

First paragraph (nucleus): "At the bottom on the right is a blacksmith's workshop, a 
plebeian antithesis to the tournament going on in the upper part of the painting, which 
is chiefly an aristocratic activity." 

Relation: background 

Second paragraph (satellite): "The choice of a tournament for the month of February is 
related to the joust and revelries that took place in carnival time."] 



Figure 4: An example of a rhetorical tree (simplified). 
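
A rhetorical tree of this kind can be held in a small recursive structure. The sketch below is illustrative only: `RSTNode` and `relations` are hypothetical names, the quoted texts are shortened, and attaching satellites directly to their nucleus is a simplification of the theory.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class RSTNode:
    """A text segment; `relation` names its link to the parent nucleus."""
    text: str
    relation: Optional[str] = None  # None for the root nucleus
    satellites: List["RSTNode"] = field(default_factory=list)


def relations(node: RSTNode) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (nucleus text, relation, satellite text) for every branch, depth first."""
    for satellite in node.satellites:
        yield (node.text, satellite.relation, satellite.text)
        yield from relations(satellite)


# The simplified example of Figure 4, with shortened texts.
tree = RSTNode(
    text="At the bottom on the right is a blacksmith's workshop ...",
    satellites=[
        RSTNode(
            text="The choice of a tournament for the month of February ...",
            relation="background",
        )
    ],
)
```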

RST was originally developed as part of work carried out in the computer-based text 
generation field. In a previous work (Not and Zancanaro, 2001), we described a set of 
techniques to dynamically compose adaptive presentations of artworks from a 
repository of multimedia data annotated with rhetorical relations. These techniques 
have been exploited in an audio-based, location-aware adaptive audio guide 
described in Not et al. (2000). The audio commentaries produced by this audio guide 
are automatically annotated with the rhetorical structure. In the next section we will 
discuss how this information can be used to create more effective video clips to 
accompany the commentary. 

6. Video Clips on Still Images 
Video clips are built by first searching for the sequence of details mentioned in the 
audio commentary, deciding the segmentation in shots, and then planning the 
camera movements in order to smoothly focus on each detail in synchrony with the 
verbal part. 
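
The first step, turning the details mentioned in the commentary into a timed sequence of regions to focus on, might be sketched as follows. The function name `plan_focus_sequence`, the input formats and the rule of skipping repeated mentions are assumptions made for illustration; shot segmentation and camera planning would follow as separate steps.

```python
def plan_focus_sequence(mentions, regions):
    """Order the details to focus on, in synchrony with the audio.

    mentions: list of (start_seconds, detail_name) taken from the commentary.
    regions:  dict mapping detail_name to (x, y, width, height) on the still image.
    Returns a list of (start_seconds, detail_name, region), sorted by time.
    """
    plan = []
    for start, detail in sorted(mentions):
        if detail not in regions:
            continue  # nothing to show for this mention
        if plan and plan[-1][1] == detail:
            continue  # already focused on this detail
        plan.append((start, detail, regions[detail]))
    return plan
```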

In building a video clip, a set of strategies similar to those used in documentaries is 
employed. Two broad classes of strategies have been identified. The first class 
encompasses constraints imposed by the grammar of cinematography, while the 
second deals with conventions normally used in guiding camera movements in the 
production of documentaries. 

While the constraints are just sequences of forbidden camera movements, the 
conventions are expressed in terms of rhetorical structures found in the audio 
commentary. In our view, the verbal part of the documentary always drives the visual 
part. 

6.1 Constraints on Camera Movements 

In order to ensure a pleasant presentation, constraints on camera movements have 
to be imposed. For example, a pan from right to left forbids a subsequent pan from 
left to right. In general, applying any given movement (pan, tilt and zoom) and then 
immediately reapplying it in the reverse direction is discouraged because this action 
renders the video uncomfortable to watch. 

Given that the audio commentary drives the visual part, it is often the case that such 
forbidden combinations of camera movements are required. In these cases, two tricks 
can be applied: either choosing a different way of focusing on the detail required by the 
verbal part (for example, a zoom out can often effectively replace a pan), or starting a 
new shot altogether. In the latter case, the two shots should be linked by a transition 
effect that suggests continuity, such as a short fade. 
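
The reversal constraint and the second trick (starting a new shot) can be illustrated with a short check over a list of movement names. The names in `OPPOSITE` and both functions are hypothetical; this sketches the idea rather than the planner used in the prototype.

```python
# Hypothetical movement names; each maps to the movement that reverses it.
OPPOSITE = {
    "pan_left": "pan_right", "pan_right": "pan_left",
    "tilt_up": "tilt_down", "tilt_down": "tilt_up",
    "zoom_in": "zoom_out", "zoom_out": "zoom_in",
}


def find_violations(moves):
    """Indices i where moves[i] immediately reverses moves[i - 1]."""
    return [i for i in range(1, len(moves))
            if OPPOSITE.get(moves[i - 1]) == moves[i]]


def split_into_shots(moves):
    """Start a new shot wherever a movement would reverse the previous one."""
    shots = [[]]
    for move in moves:
        if shots[-1] and OPPOSITE.get(shots[-1][-1]) == move:
            shots.append([])
        shots[-1].append(move)
    return shots
```

Consecutive shots produced this way would then be joined by a transition that suggests continuity, such as a short fade.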

6.2 Rhetorical Strategies 

Constraints on camera movements alone are sufficient to ensure a pleasant 
presentation, yet they do not impact the effectiveness of the video clip. In order to 
have a more engaging presentation, the visual part should not only focus on the right 
detail at the right time, but also support the presentation of new audio information by 
illustrating its relation to information that has been already given. In this manner, 
continuity between the pieces of information is built, and in turn facilitates the viewing 
of the video clip while stimulating the absorption of new information. 

The text in figure 5 can be visually represented with two shots of the same image 
(that is, the tournament) linked by a long cross fade. Technically, having two shots is 
not necessary, since the image is the same, but the cross fade helps the user 
understand that background information is going to be provided. The first image is 
thus presented while the first paragraph is heard over the audio; then when the audio 
switches to, in this case, the background information, the image is enlarged to cover 
the entire panel and finally is refocused on the detail once the audio has stopped. 
Figure 5: The "Tournament" example: from the text to the video clip. 

A rhetorical strategy suggests, on the basis of a rhetorical tree configuration, what 
shot segmentation and which transition effect should be applied. The strategies 
employed in the Torre Aquila multimedia guide were elicited by a focus group activity 
with a documentary director. 
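
A rhetorical strategy can be pictured as a lookup from a rhetorical relation to a shot-segmentation decision and a transition effect. Only the two entries below are taken from the text (a long cross fade for background information, and sequential cuts for enumerations); the table layout, the field names and the default entry are assumptions made for illustration.

```python
# Relation name -> decision. Only these two strategies are described in the paper.
STRATEGIES = {
    "background": {"new_shot": True, "transition": "cross_fade", "duration": "long"},
    "enumeration": {"new_shot": True, "transition": "cut", "duration": None},
}

# Assumed fallback: stay in the same shot and apply no transition.
DEFAULT_STRATEGY = {"new_shot": False, "transition": None, "duration": None}


def plan_transition(relation):
    """Return the shot and transition decision for a rhetorical relation."""
    return STRATEGIES.get(relation.lower(), DEFAULT_STRATEGY)
```

For the "Tournament" example, `plan_transition("background")` selects a new shot joined by a long cross fade.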

7. Preliminary Evaluation 

A formal evaluation of the prototype will start next March at Torre Aquila. Preliminary 
studies and pilot tests show encouraging results and interesting effects. 

All users became acquainted with the system very quickly. Most of them used the 
PDA as a "3D mouse", pointing directly to the infrared emitters to speed up the 
localization. Future investigations will evaluate how users can be more directly 
involved in the process of localization. 

Most of the users complained before actually using the system that a video sequence 
on a PDA would distract their attention from the real artwork. After a short interaction 
with the system, however, they appreciated the possibility of quickly localizing small 
details on the fresco. This suggests that the use of cinematic techniques in a 
multimedia guide can be effective, particularly in explaining a complex painting. The 
different effects that the verbal and the visual parts of the presentation have on the 
user’s attention are yet to be investigated. 

8. Conclusion 

This paper discussed how cinematic techniques can be used in a multimedia 
museum guide to provide more pleasant and effective presentation of information. 
Video clips are built by first searching for the sequence of details mentioned in the 
audio commentary, deciding the segmentation in shots, and then planning the 
camera movements so as to smoothly focus on each detail in synchrony with the 
verbal counterpart. In our approach, the verbal part always drives the visual part. 

The video clips are built according to two broad classes of strategies. The first class 
encompasses constraints imposed by the grammar of cinematography, while the 
second deals with conventions normally used in guiding camera movements in the 
production of documentaries. 
While the constraints are just a sequence of forbidden camera movements, the 
conventions are expressed in terms of rhetorical structures found in the audio 
commentary. By coupling these cinematic techniques into organized guidelines, the 
creation of multimedia video clips can vastly help to improve the quality as well as the 
effectiveness of the presentations. A visitor to a museum can thus benefit from an 
automatic guide that causes minimal interference with the enjoyment and learning 
experience provided by an exhibit. 

As a case study, a multimedia museum guide for Torre Aquila in Trento has been 
presented and the results of a preliminary evaluation have been discussed. 